Abstract

Background: There has been a growing interest in identifying context-specific active protein-protein interaction
(PPI) subnetworks through integration of PPI and time course gene expression data. However, the interaction
dynamics during the biological process under study have not been sufficiently considered previously.

Methods: Here we propose a topology-phase locking (TopoPL) based scoring metric for identifying active PPI
subnetworks from time series expression data. First, the temporal coordination in gene expression changes is
evaluated through phase locking analysis. The results are then integrated with PPI data to define an activity
score for each PPI subnetwork, based on individual member expression as well as topological characteristics of the
PPI network and of the expression temporal coordination network. Lastly, the subnetworks with the top scores in
the whole PPI network are identified through a simulated annealing search.

Results: Application of TopoPL to simulated data and to the yeast cell cycle data showed that it can more
sensitively identify biologically meaningful subnetworks than the method that only utilizes the static PPI topology,
or the additive scoring method. Using TopoPL we identified a core subnetwork with 49 genes important to the yeast
cell cycle. Interestingly, this core contains a protein complex that is known to be related to the arrangement of
ribosome subunits and that exhibits extremely high gene expression synchronization.

Conclusions: Inclusion of interaction dynamics is important to the identification of relevant gene networks. 



Background 

Life is a transient dynamic phenomenon. Biological 
functions and phenotypic traits, including disease traits, 
stem from the interactions across multiple scales in the 
living system. Therefore, characterizing the
condition-dependent interactions and emergent dynamics is
important for identifying the elements relevant to a
given biological process.

Recently, a number of computational methods have
been developed to identify condition-specific
protein-protein interaction (PPI) subnetworks through integration
of generic PPI data (typically obtained from an
interactome database) and condition-specific gene expression
data [1]. For instance, by integrating yeast PPI networks
with gene expression data, Han et al. showed that some
modules are active only at specific times and locations [2].
Qi et al. suggested that such an approach enables the
identification of subnetworks that are active under certain
conditions [3]. In a cell cycle study by de Lichtenberg et al., it
was found that cell cycle-regulated and constitutively
expressed proteins form protein complexes at particular
time points during the cell cycle [4]. In these studies,
correlation in expression or a similar measure is usually used
to capture the condition-specific gene interactions [3-9].
More recently, a number of studies have focused on integrating
PPI networks with time course expression data to
identify subnetworks that exhibit meaningful dynamic changes
in transcription. In a study of yeast metabolic oscillation
by Tang et al. [5], an active PPI network was first
constructed for each time point (out of a total of 36 time
points) by identifying interacting protein pairs
whose corresponding genes exhibit a certain significant
pattern in expression at that time point. The Markov
clustering algorithm was then applied to create candidate
functional modules for each network. These modules were
found to have much more significant biological meaning
than those derived using static PPI networks only [5]. In
another study, Jin et al. [6] defined a dynamic network
module to be a set of proteins satisfying two conditions:
(1) they form a connected component in the PPI network;
and (2) their expression profiles exhibit time-shifted and
local similarity patterns, as evaluated using a
time-warping dynamic programming algorithm. Using yeast as a
model system and time course expression data from
multiple experiments, they then showed that the majority of
the identified dynamic modules are functionally
homogeneous, and that many of them shed light on the sequential
ordering of the molecular events in the cellular system of
yeast [6].

Understanding cellular physiology from a dynamic and
systems perspective is important and
valuable, as demonstrated by these studies and many
others [10]. Incorporating time course data is a necessity
in this direction. Such data not only capture how a whole
system evolves over time, but also contain rich
information regarding the coordination, namely the interaction, of
the different elements in the system. The measurements
from different time points are not independent of each
other; this is in contrast to static measurements of
different samples, or of the same sample under different
conditions. However, most of the existing studies either
construct active networks independently at each time
point [5], or rely on pattern similarity measures that infer
interaction while ignoring the inter-time point
dependence [6]. Overlooking the interdependence among the
time points not only loses sensitivity toward detecting
relevant interactions but could also lead to erroneous
predictions [11,12].

In this study we investigate the application of an idea
rooted in statistical physics and non-linear dynamics to
characterize the state of gene interaction networks and
use it to identify relevant subnetworks. We regard active
subnetworks to be those showing a high degree of
differential expression, and high synchrony in expression
changes (i.e., coordination in the timing of expression
changes) among the members. Phase locking
analysis will be utilized to evaluate expression synchrony, and
to capture the dynamic interaction structure. Recently
we found that the phase locking metric can identify
interacting gene pairs more efficiently than correlation
[11].

Previously, we proposed a Pathway Connectivity Index
(PCI) to represent the activity of pre-defined pathways,
such as those defined in KEGG and Biocarta. PCI
utilizes expression information of all genes in a pathway,
as well as the topological properties of its interaction
networks. Its advantages have been demonstrated [13].
This metric was later implemented in a software tool
entitled Topological Analysis of Pathway-Phenotype
Association (TAPPA). Here, to capture contributions
from topological characteristics of the dynamic
interaction network, we integrate the phase locking analysis
into PCI to define a novel metric: the Topology-Phase
Locking (TopoPL) analysis [13]. With both simulated data
and real yeast expression data from the cell cycle, we will
demonstrate the merits of TopoPL.

Methods 

Simulation study 

Simulation utilized the sample expression data gal80R
given in Cytoscape (http://cytoscape.org/). There are
331 genes and 361 interactions in this network. Within
it, we randomly selected subnetworks of three different
sizes n (n = 40, 60, 80) as condition-responsive. In each
responsive subnetwork, m% (80%, 90%, 100%) of the genes
are defined to be active. The significance values of the active
genes were assigned randomly from the top n × m%
significance values in gal80R, and those of the other genes were
randomly sampled from the rest of the significance
values. The phase locking index $\lambda$ (see Phase locking
analysis) of the interactions in the predefined responsive
subnetwork was sampled from N(0.8, 0.5), i.e. a normal distribution
with $\mu = 0.8$, $\sigma = 0.5$, while $\lambda$ for the remaining edges
was sampled from N(0.4, 0.3). The choice of these
values was based on the distribution of the $\lambda$ values of
gene pairs in protein complexes and of randomly
selected gene pairs. For protein complexes we used the
MIPS annotation
(http://mips.helmholtz-muenchen.de/genre/proj/yeast)
edited by the Gerstein Lab
(http://www.gersteinlab.org/proj/bottleneck/mips.txt).
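
The edge-sampling step of the simulation can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_lambda`, the toy edge list and the clipping of samples to the interval [0, 1] are assumptions introduced here (the text does not say how out-of-range draws were handled).

```python
import random

def sample_lambda(edges, responsive, rng, inside=(0.8, 0.5), outside=(0.4, 0.3)):
    """Draw a phase locking index for every edge of a simulated network.

    Edges with both ends in the responsive subnetwork use N(0.8, 0.5);
    all other edges use N(0.4, 0.3), following the simulation description.
    """
    lam = {}
    for u, v in edges:
        mu, sigma = inside if (u in responsive and v in responsive) else outside
        # Assumption (not stated in the text): clip draws to the valid range [0, 1].
        lam[(u, v)] = min(1.0, max(0.0, rng.gauss(mu, sigma)))
    return lam

# Hypothetical toy network: a chain of five genes, three of them responsive.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
responsive = {"a", "b", "c"}
lam = sample_lambda(edges, responsive, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the replicate simulations reproducible.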

A gene of the predefined responsive subnetwork that
is also in the TopoPL-identified subnetwork is considered a
successful identification. This procedure was repeated
10 times, and the true positive (TP) rate, or sensitivity, was
defined as the number of successful identifications
divided by the size n of the predefined network. The
false positive (FP) rate was estimated
as the number of false identifications divided by
the size of the identified subnetwork; its complement is the
precision (also referred to here as specificity). The F score is a
measure of a test's accuracy. It considers both the
precision and the sensitivity of the test:

$$ F = \frac{\mathrm{specificity} \times \mathrm{sensitivity}}{(\mathrm{specificity} + \mathrm{sensitivity})/2} $$

We used the average sensitivity, specificity and F score
to measure the performance of TopoPL. The
performance was also evaluated with the Receiver Operating
Characteristic (ROC) curve, a plot of the true positive rate
against the false positive rate [11].
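
As an illustration of these definitions, the sketch below computes the sensitivity, the precision and the F score for one simulated run. The function name `evaluate` and the toy gene sets are hypothetical; precision is taken as the fraction of identified genes that belong to the predefined subnetwork.

```python
def evaluate(predefined, identified):
    """Return (sensitivity, precision, F score) for one identified subnetwork."""
    hits = len(predefined & identified)            # successful identifications
    sensitivity = hits / len(predefined)
    precision = hits / len(identified) if identified else 0.0
    mean = (precision + sensitivity) / 2
    f_score = precision * sensitivity / mean if mean else 0.0
    return sensitivity, precision, f_score

# Toy example: 40 predefined genes, 50 identified genes, 30 in common.
truth = set(range(40))
found = set(range(10, 60))
sens, prec, f = evaluate(truth, found)
print(sens, prec, round(f, 4))   # 0.75 0.6 0.6667
```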

Gene expression and protein-protein interaction data

Gene expression data were downloaded from EMBL's
Huber group (http://www.ebi.ac.uk/huber-srv/scercycle/).
The data come from a time course study of the yeast cell cycle, in which cells
were arrested using alpha factor or cdc28. The alpha factor
dataset contains 41 time points and the cdc28 dataset
contains 44 time points, both at 5-minute resolution. These
datasets provide strand-specific profiles of temporal
expression during the mitotic cell cycle of S. cerevisiae,
monitored for more than three complete cell divisions
[14]. Yeast PPI data were downloaded from BioGRID
(thebiogrid.org, version 3.1.69).

Phase locking analysis 

The details of the definitions and steps of the phase locking
analysis were described in our previous work [11] and are
briefly summarized here. Given a time series s(t), its
Hilbert transform is given by

$$ s_H(t) = \frac{1}{\pi}\,\mathrm{PV}\!\int_{-\infty}^{\infty} \frac{s(\tau)}{t - \tau}\,d\tau \tag{1} $$

where PV stands for the Cauchy principal value of the
integral. The corresponding analytic signal can then be
constructed by:

$$ s(t) + i\,s_H(t) = A(t)\,e^{i\varphi(t)} \tag{2} $$

where the instantaneous phase $\varphi(t)$ is thus uniquely
determined. For two time series with instantaneous
phases $\varphi_i(t)$ and $\varphi_j(t)$, their cyclic relative phase is
determined by

$$ \psi(t) = \left(\varphi_i(t) - \varphi_j(t)\right) \bmod (2\pi) \tag{3} $$

If two time series interact with each other, there will
be rhythmic adjustment resulting in phase locking:
$\psi(t) = \psi_0$ is a constant. To evaluate the significance of
phase locking, we utilize the circular mean of the phase
difference

$$ \lambda = \left|\left\langle e^{i\psi(t)} \right\rangle\right| \tag{4} $$

In a perfect locking $\lambda = |\exp(i\psi_0)| = 1$, and $\lambda \to 0$
when $\psi(t)$ is randomly distributed. $\lambda$ offers a new
measure to infer potential interaction between gene
pairs [11].
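
A minimal sketch of this calculation for two sampled series is given below. It is an illustration rather than the authors' implementation: the analytic signal is obtained with a naive discrete Fourier transform so that the example needs only the standard library (in practice a routine such as `scipy.signal.hilbert` does the same job), and subtracting the mean of each series before the transform is an assumption made here. The function names are hypothetical.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal s(t) + i*s_H(t) of a real series, via a naive O(n^2) DFT."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue                                # keep the Nyquist bin unchanged
        spec[k] *= 2 if k < (n + 1) // 2 else 0     # double positive, zero negative frequencies
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def phase_locking_index(x, y):
    """Length of the mean unit vector of the relative phase between x and y."""
    def phases(series):
        mean = sum(series) / len(series)            # assumption: centre the series first
        return [cmath.phase(z) for z in analytic_signal([v - mean for v in series])]
    rel = [(a - b) % (2 * math.pi) for a, b in zip(phases(x), phases(y))]
    return abs(sum(cmath.exp(1j * r) for r in rel) / len(rel))

n = 40
x = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
y_locked = [math.sin(2 * math.pi * 3 * t / n - 1.0) for t in range(n)]   # constant lag
y_drift = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]          # different rhythm
lam_locked = phase_locking_index(x, y_locked)
lam_drift = phase_locking_index(x, y_drift)
```

In this toy example the series with a constant lag gives an index close to 1, while the series with a different rhythm gives an index close to 0.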

TopoPL 

For each gene i, the EDGE software [15] was used to
calculate $p_i$, the significance of its expression changes
during the time course study. We convert $p_i$ to a z-score
through $z_i = \Phi^{-1}(1 - p_i)$, where $\Phi^{-1}$ is the inverse
normal CDF. Let $A^{(p)} = (a^{(p)}_{ij})$ be the adjacency matrix of
genes in a PPI subnetwork and $A = (a_{ij}) = (a^{(p)}_{ij} \cdot \lambda_{ij})$.
TopoPL defines the overall activity of a subnetwork with:

$$ z_A^{\mathrm{TopoPL}} = \sum_{i \in A} \sum_{j \in A} |z_i|^{0.5} \cdot a_{ij} \cdot |z_j|^{0.5} \cdot \mathrm{sgn}(z_i + z_j) \tag{5} $$

$z_A^{\mathrm{TopoPL}}$ captures the dynamic topological property
of the subnetwork, and hub genes (genes with high
network degree) contribute more to this metric.
$|z_i|^{0.5} \cdot a_{ij} \cdot |z_j|^{0.5} \cdot \mathrm{sgn}(z_i + z_j)$, $i \neq j$, can be regarded as
the "activity measurement" of the interaction. Gene
pairs with significant and synchronized expression
changes, and whose gene products interact, contribute
more to the activity of the subnetwork.

This metric is an improved version of the PCI that we
previously proposed to identify active pathways from gene
expression data [13]:
$\mathrm{PCI} = \sum_i \sum_j |x_{is}|^{0.5} \cdot a_{ij} \cdot |x_{js}|^{0.5} \cdot \mathrm{sgn}(x_{is} + x_{js})$,
where $x_{is}$ is the normalized log expression measurement of
gene i in sample s, and $(a_{ij})$ is the adjacency matrix of the
PPI network of genes in the pathway. The merit of PCI has
been demonstrated in previous works [13]. To reduce the
potential impact on the network measure from residual
inter-sample and inter-array biases after normalization,
here we adopted the non-parametric measure $z_i$ in place of
$x_{is}$. A metric similar to Eq. (5) was developed recently by us
to predict candidate disease genes for type 1 diabetes,
where $z_i$ is the z-score of disease relevance of gene i. There
again we demonstrated the advantage of incorporating
network structural information [16].

Obviously, $z_A^{\mathrm{TopoPL}}$ increases with the number of nodes
and edges. To adjust for network size and density, we
use the following equation

$$ Z_A^{\mathrm{TopoPL}} = \frac{z_A^{\mathrm{TopoPL}}}{\#\mathrm{nodes} + \#\mathrm{edges}} \tag{6} $$
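
The scoring in Eqs. (5) and (6) can be sketched as follows. This is an illustrative reading of the formulas, not the authors' code: the function names are hypothetical, p values are clamped away from 0 and 1 so that the inverse normal CDF stays finite, and the diagonal of the adjacency matrix is assumed to be 1, so that each gene contributes its own z-score.

```python
import math
from statistics import NormalDist

def z_score(p, eps=1e-12):
    """z = inverse normal CDF of (1 - p); p is clamped to keep the result finite."""
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(1.0 - p)

def sgn(x):
    return (x > 0) - (x < 0)

def topopl_score(z, lam):
    """Raw (Eq. 5) and size-adjusted (Eq. 6) activity of a subnetwork.

    z   : dict gene -> z-score, for the genes of the subnetwork
    lam : dict (gene_i, gene_j) -> phase locking index, one entry per PPI edge
    """
    # Assumption: a_ii = 1, so the i = j terms reduce to |z_i| * sgn(2 z_i) = z_i.
    raw = sum(z.values())
    n_edges = 0
    for (i, j), locking in lam.items():
        if i in z and j in z:
            n_edges += 1
            # Each edge appears twice in the double sum, as (i, j) and (j, i).
            raw += 2 * math.sqrt(abs(z[i])) * locking * math.sqrt(abs(z[j])) * sgn(z[i] + z[j])
    return raw, raw / (len(z) + n_edges)

z = {"g1": 4.0, "g2": 1.0}
lam = {("g1", "g2"): 0.5}
raw, adjusted = topopl_score(z, lam)
print(raw)   # 7.0
```

Replacing every phase locking index by 1 in this sketch gives a score that depends only on the static PPI topology.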



We implemented the searching procedure based on 
simulated annealing. The pseudocode of the algorithm 
is described below: 

Input: the entire network $G_0 = (V, E)$; a set of
parameters for running simulated annealing: start
temperature $T_{start}$ (= 1 in this study), end temperature $T_{end}$ (=
1e-8 in this study), number of iterations N.

Output: the subnetwork with the highest score.

Steps: initialize each node with its expression
significance score $z_i$ and each edge with its phase locking
index; select the largest connected component
(subnetwork) $G_{out}$ from the top 10% significant nodes of $G_0$;
calculate the score $z_{G_{out}}^{\mathrm{TopoPL}}$ of $G_{out}$; then run
the following:

For i = 1 to N, Do
    Calculate the current temperature $T_i$ (geometric cooling with base 0.8);
    Exit loop if $T_i < T_{end}$;
    Randomly pick a node $n \in V$;
    IF ($n \in G_{try}$), remove n from $G_{try}$;
    ELSE add n to $G_{try}$;
    Calculate the score $z_{G_{try}}^{\mathrm{TopoPL}}$ for the largest connected
    component of $G_{try}$;
    Calculate $\Delta = z_{G_{try}}^{\mathrm{TopoPL}} - z_{G_{out}}^{\mathrm{TopoPL}}$;
    IF $\Delta > 0$, then $G_{out} \leftarrow G_{try}$;
    ELSE, accept $G_{out} \leftarrow G_{try}$ with the probability
    $p = e^{\Delta / T_i}$;
END

These steps can be iterated to identify subnetworks 
with the next highest scores and so on. 
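
The search can be sketched in a few dozen lines. The code below is a simplified illustration under stated assumptions, not the authors' implementation: all function names and the toy network are hypothetical, the temperature is interpolated geometrically between the start and end values (a schedule chosen here for convenience), the size-adjusted score with a unit diagonal is used, and the best subnetwork visited is remembered.

```python
import math
import random

def sgn(x):
    return (x > 0) - (x < 0)

def largest_component(nodes, adj):
    """Largest connected component of the subgraph induced by `nodes`."""
    nodes, best, seen = set(nodes), set(), set()
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def score(nodes, adj, z, lam):
    """Size-adjusted score; lam is keyed by (i, j) with i < j (assumes a_ii = 1)."""
    total, n_edges = sum(z[i] for i in nodes), 0
    for i in nodes:
        for j in adj[i]:
            if j in nodes and i < j:
                n_edges += 1
                total += 2 * math.sqrt(abs(z[i] * z[j])) * lam[(i, j)] * sgn(z[i] + z[j])
    return total / (len(nodes) + n_edges)

def anneal(adj, z, lam, n_iter=2000, t_start=1.0, t_end=1e-8, seed=0):
    rng = random.Random(seed)
    all_nodes = sorted(adj)
    top = sorted(all_nodes, key=lambda v: -z[v])[:max(1, len(all_nodes) // 10)]
    current = largest_component(top, adj)           # seed: top 10% significant nodes
    cur_score = score(current, adj, z, lam)
    best, best_score = set(current), cur_score
    for i in range(1, n_iter + 1):
        # Assumed schedule: geometric interpolation from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / n_iter)
        trial = largest_component(current ^ {rng.choice(all_nodes)}, adj)  # toggle one node
        if not trial:
            continue
        delta = score(trial, adj, z, lam) - cur_score
        if delta > 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = trial, cur_score + delta
            if cur_score > best_score:
                best, best_score = set(current), cur_score
    return best, best_score

# Hypothetical toy network: a chain of 10 genes; genes 0-3 are active and synchronized.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
z = {i: 3.0 if i < 4 else -0.5 for i in range(10)}
lam = {(i, i + 1): 0.9 if i < 3 else 0.2 for i in range(9)}
best, best_score = anneal(adj, z, lam)
```

Fixing the random seed makes a run repeatable; in practice several seeds would be tried, since simulated annealing does not guarantee the global optimum.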

In this study we compared TopoPL with two other
methods: (1) the commonly used network scoring
method that sums the significance levels of all genes in the
network (hereafter referred to as the Additive scoring
method):

$$ z^{\mathrm{Additive}} = \sum_{i \in A} z_i \tag{7} $$

(2) a metric that we previously proposed in our
TAPPA software package [13] (hereafter referred to as
the TAPPA scoring method) that only utilizes the
topological characteristics of the PPI network:

$$ z^{\mathrm{Topo}} = \sum_{i \in A} \sum_{j \in A} |z_i|^{0.5} \cdot a^{(p)}_{ij} \cdot |z_j|^{0.5} \cdot \mathrm{sgn}(z_i + z_j) \tag{8} $$

Results 

Simulation study 

Using the simulated yeast gene expression data, we
compared TopoPL with two other methods: (1) the Additive
scoring method (see Eq. (7) in Methods); and
(2) TAPPA (see Eq. (8) in Methods) [13].
Additive does not use any structural information of the
network, whereas TAPPA uses only the predefined static network
structure, ignoring the dynamic, condition-specific
changes in interaction patterns. Figure 1 summarizes the
average sensitivity, precision and F score from all
simulated data: 10 replicates each of three network sizes (n =
40, 60, 80), at three states of activity (m = 80%, 90%,
100%). Though the three methods have similar
sensitivity, the precision of TopoPL is higher. The F scores show
that TopoPL performs better than TAPPA and Additive.
The ROC curves also indicate that TopoPL performs
better than the other two approaches, with the highest
Area Under Curve (AUC), as shown in Figure 2.
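
For readers unfamiliar with the AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen true member receives a higher score than a randomly chosen non-member, with ties counted as one half. How genes were scored and ranked to draw the ROC curves is not described in the text, so the scores and labels here are purely hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-gene scores; label 1 marks genes of the predefined subnetwork.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
area = auc(scores, labels)
print(round(area, 4))   # 0.8333
```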

Yeast cell cycle data 

After 100,000 iterations (N = 100,000), TopoPL
identified a subnetwork of 524 genes and 2078 edges with the
alpha factor dataset (in the following sections, unless
specified otherwise, we only report results from the
alpha factor dataset; the cdc28 dataset gives very similar
results). We performed GO term enrichment
analysis with the topGO package in Bioconductor
(http://www.bioconductor.org) to investigate how well the identified
subnetwork captured the relevant functional modules
[17]. The most significant "Biological Process" GO
terms are listed in Table 1. Many cell cycle, growth, and
division-related processes were enriched in this
subnetwork, such as GO:0042254 (ribosome biogenesis);
GO:0007049 (cell cycle); GO:0022613 (ribonucleoprotein
complex biogenesis); GO:0000278 (mitotic cell cycle);
GO:0000280 (nuclear division). Almost all top terms are
cell cycle related. Ribosomes are "factories" of protein
synthesis, and the synthesis of ribosomes is a key control
point for the regulation of cell growth and division.

Presently, there is no "gold standard" to evaluate the
biological relevance of network modeling algorithms.
Here we investigated the functional enrichment of the
proteins in the identified subnetworks [9], and
compared it with that obtained using Additive and TAPPA. The
p values (Bonferroni corrected) of the top 2 terms are
3.33E-13 and 6.5E-12 with TAPPA, and 3.05E-8 and
3.13E-8 with Additive, respectively. TAPPA's p values are slightly
larger than those of TopoPL (1.04E-13 and 9.31E-13, Table 1),
but Additive gave much larger p
values. This indicates that including interaction
structure, especially its dynamics, improves the sensitivity of
identifying biologically relevant gene subnetworks.

It has been demonstrated that hub genes and
high-betweenness genes (i.e. genes with a high number of
shortest paths passing through them) play important roles in
gene networks [18]. Table 2 lists the top 30
high-degree and high-betweenness nodes from the identified
subnetwork. Though not annotated with cell cycle,
HEK2 is an RNA binding protein involved in asymmetric
localization of the mRNA of ASH1, a transcription
factor that acts to specify daughter cell fate in mating-type
switching [19]. DSN1 has been annotated with cell cycle;
it is important for chromosome segregation in S.
cerevisiae [20]. TPK1 has been annotated with the cell cycle
GO terms. It is a cAMP dependent protein kinase
which mediates basic cellular processes, such as the
yeast-to-hypha transition and cell cycle regulation [21].
NOP15 is also annotated with cell cycle GO terms. The
transcription level of NOP15 is an important
determinant of the productivity of RNA, and its increased
transcription provides an effective approach to obtain higher
RNA yields in yeast [22].

The top 30 high-degree and high-betweenness nodes
from the identified subnetwork and their interactions
are presented in Figure 3. We hypothesize that they
constitute a relevance core to the yeast cell cycle, and
provide a holistic picture of the primary molecular basis of
the cell cycle. In the core there are 18 genes annotated with
GO:0007049 cell cycle (round rectangles). This rate (18
out of 39) is higher than that of the whole identified
subnetwork (128 out of 524, a 1.9 fold enhancement,
p = 0.11), and that of all genes in yeast (612 out of
5286, p = 0.00013). These results suggest that degree
and betweenness can be utilized to further improve the
performance of functional gene module identification.

Figure 1 Performance of TopoPL, TAPPA and Additive. The three approaches have similar sensitivity, but TopoPL has higher precision. Results
are from the simulated data.

We investigated the distribution of the phase locking 
index within the identified subnetwork. Clearly on aver- 
age there is a higher degree of phase locking in it than 
in the whole PPI network (Figure 4). Interestingly the 
synchronization in the core is even higher, indicating 
that these core genes may work more closely in a coor- 
dinated fashion than others in the identified subnetwork. 

Highly synchronized protein complex 

We further examined the highly synchronized regions in
the network core. Figure 5 shows the top 20 most
synchronized interactions (corresponding to ~1% of the
interactions in the identified subnetwork). MAK21 (NOC2)
is at the center of this region. MAK21 is involved in
preribosome export from the nucleus to the cytoplasm.
Though it is not annotated with a cell cycle GO term,
its homologue SWA2 likely plays a role in ribosome
biogenesis that is essential for coordinated mitotic
progression [23].

Figure 2 ROC plot of TopoPL, TAPPA and Additive. TopoPL has
the highest AUC (TopoPL: 0.954; TAPPA: 0.933; Additive: 0.924).
Results are from the simulated data.

In protein complexes, the core components, which
consist of two or more proteins that are present in most
complex isoforms, are often regarded as functional units,
as they show a surprisingly high degree of functional,
essentiality, and localization homogeneity [24,25]. We
therefore also surveyed protein complexes and core
components in the identified subnetwork. We found
that all core components in complex 56 are in our core
subnetwork, and they are shown in Figure 6. Interestingly,
all six genes show extremely high synchronization
(0.976±0.006, see Figure 4). Their expression profiles are
given in Figure 7. We also included their expression
profiles in the cdc28 dataset; again, high synchronization
in expression is evident. This means that they are
coordinated to work closely during the cell cycle. This is not
surprising, as a large percentage of protein pairs within
the core subnetwork were coexpressed at the same time
during the cell cycle [24]. Our algorithm is naturally good at
finding highly synchronized gene pairs, and therefore tends
to include more core components from the same
complexes.

Table 1 Top 10 GO Biological Process terms
significantly enriched in the subnetwork identified
during yeast cell cycle.

GO ID        GO name                               P value
GO:0042254   ribosome biogenesis                   1.04E-13
GO:0007049   cell cycle                            9.31E-13
GO:0022613   ribonucleoprotein complex biogenesis  1.46E-12
GO:0000278   mitotic cell cycle                    2.07E-11
GO:0000280   nuclear division                      1.00E-08
GO:0022402   cell cycle process                    2.81E-08
GO:0044085   cellular component biogenesis         3.00E-08
GO:0051301   cell division                         3.65E-08
GO:0048285   organelle fission                     5.13E-08
GO:0006364   rRNA processing                       1.67E-07

P values were Bonferroni corrected.

Table 2 Top 30 genes with highest degrees or betweenness in the identified subnetwork.

Degree                                     Betweenness
Official Symbol  degree       Cell cycle?  Official Symbol  betweenness  Cell cycle?
HEK2             155                       HEK2             61898
DSN1             76           YES          DSN1             24078        YES
NOP15            70           YES          TPK1             12196        YES
CIC1             60                        HSP82            10612
NOP7             58           YES          YPL141C          5798
RRP5             54                        ORC1             5767
NOC2             54                        RRP5             5641
ERB1             52                        KSS1             5560         YES
RPF2             52                        RAD53            4074         YES
BRX1             52                        DBF2             3755         YES
NUG1             51                        CLB2             3698         YES
TPK1             50           YES          CDC5             3218         YES
HAS1             50                        NOP15            2904         YES
NOP2             50                        HHF1             2902
ORC1             49                        BUD21            2745
NSA1             49                        SHE2             2650
YTM1             46                        SML1             2432         YES
RLP7             45                        RRP1             2401
RRP1             44                        HHT1             2376         YES
MRT4             42                        HAS1             2221
HSP82            40                        (illegible)      2207
DRS1             38                        MPP10            2116
MAK21            (illegible)               (illegible)      1941         YES
PUF6             36                        CSM1             1842         YES
NOP4             36                        RCK1             1808         YES
RAD53            34           YES          RFA1             1770         YES
RLP24            34                        CDC20            1737         YES
EBP2             34                        ACE2             1709         YES
RPF1             32                        YAK1             1706
MPP10            31                        CLN2             1701         YES

Interestingly, all six genes are annotated with
GO:0042254 (ribosome biogenesis), which is defined
as "A cellular process that results in the biosynthesis of
constituent macromolecules, assembly, and arrangement
of constituent parts of ribosome subunits; includes
transport to the sites of protein synthesis".



Transcription factor binding motif analysis 

We have found that genes regulated by the same
transcription factors are likely to be highly synchronized
[11]. Here, to examine whether the reverse is true, we used
oPOSSUM (http://opossum.cisreg.ca/oPOSSUM3/) to identify
shared transcription factor binding sites (TFBS) among
the genes in the identified subnetwork [26]. Given a group
of genes, oPOSSUM first detects all TFBS documented in
the JASPAR database in promoter regions (1000 bp
upstream in this study), and then identifies
overrepresented TFBS as compared to a background gene set (all
genes in the PPI network in our study). It uses a simple
binomial distribution model to compare the rate of
occurrence of a TFBS in the set of target genes to the expected
rate estimated from the background set. Table 3 gives the
top 5 transcription factors of the identified subnetwork
and its core.

Figure 4 Boxplot of phase locking index. Plotted are the mean $\lambda$
for all interacting gene pairs in PPI (All); in the TopoPL-identified
subnetwork (SubNetwork); in the subnetwork core with the top 30 high-degree
and high-betweenness genes (SubNetwork Core); and in protein complex 56 (complex 56).

FKH1 and MCM1 are well-studied cell cycle related
transcription factors [27]. TOD6 (Pbf1) and DOT6
(Pbf2) are PAC-binding factors that are important in the
regulation of ribosome biogenesis. Existing ChIP-chip
studies suggest that the genes with the highest occupancy
by TOD6 and DOT6 are highly enriched for the GO
Biological Process "ribosome biogenesis" [28].

Figure 5 Top 20 most synchronized interactions. Rectangles
denote cell cycle genes and thicker lines indicate higher
synchronization.

Figure 6 Interaction network of protein complex 56's core
components.

Agreement between the datasets 

A good algorithm should be efficient at uncovering the
true biology underlying different datasets, which should
be consistent. In this study, we identified 484 genes with
the cdc28 dataset, and 524 genes with the alpha factor
dataset. There are 156 (~31%) overlapping genes between them
(p < 0.00001, Fisher's exact test). In contrast, there are only 87
(~17%) overlapping genes with the Additive method
(alpha: 501 genes; cdc28: 509 genes), and 145 (~29%)
with TAPPA (alpha: 499 genes; cdc28: 503 genes). This
indicates that incorporating network structural and
dynamic information can generate robust results.
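
The quoted overlap percentages can be checked with a few lines of arithmetic. The sketch below assumes that each percentage is the number of shared genes divided by the mean size of the two gene lists; this normalization is an inference from the reported numbers and is not stated in the text. The function name is hypothetical.

```python
def overlap_fraction(shared, size_a, size_b):
    """Shared genes relative to the mean size of the two gene lists (assumed convention)."""
    return shared / ((size_a + size_b) / 2)

# (shared genes, alpha factor list size, cdc28 list size) as reported in the text
reported = {
    "TopoPL": (156, 524, 484),
    "Additive": (87, 501, 509),
    "TAPPA": (145, 499, 503),
}
percent = {name: round(100 * overlap_fraction(*counts)) for name, counts in reported.items()}
print(percent)   # {'TopoPL': 31, 'Additive': 17, 'TAPPA': 29}
```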




Table 3 Transcription factor binding sites overrepresented in genes of the identified subnetwork and of its core.

The identified subnetwork

TF     gene hits¹  gene non-hits  All gene hits  all non-hits  Z-score
DOT6   131         390²           682            4445          38.7
TOD6   116         405            639            4488          32.3
FKH1   98          423            705            4422          17.0
SFP1   153         368            1203           3924          15.3
MCM1   346         175            3125           2002          13.7

Core of the identified subnetwork

TF     gene hits¹  gene non-hits  All gene hits  all non-hits  Z-score
DOT6   6           0              682            4445          25.6
TOD6   4           2              639            4488          16.3
SFP1   4           2              1203           3924          12.3
MGA1   3           3              1320           3807          11.1
STB3   4           2              1139           3988          9.14

¹ "gene hits" is the number of genes that contain the TFBS.

² Note that the sum of columns 2 and 3 is 521, rather than 524, the total number of genes in the subnetwork. This is because 3 out of the 524 genes do
not have entries in oPOSSUM.

Conclusions

The TopoPL scoring method with a simulated annealing
search was proposed in this study to identify active
subnetworks during a biological process by integrating PPI
with dynamic expression data. It incorporates both
structural and dynamic information of gene
interactions. When applied to the simulated data and the yeast
cell cycle data, it yielded more consistent results from
different experiments, and predicted more meaningful
active network modules, than two alternative scoring
methods that ignore either the network dynamics
or both the dynamics and the structure.